Experiments with an innovative tree pruning algorithm
نویسندگان
چکیده
The pruning phase is one of the necessary steps in decision tree induction. Existing pruning algorithms tend to have some or all of the following difficulties: 1) lack of theoretical support; 2) high computational complexity; 3) dependence on validation; 4) complicated implementation. The 2-norm pruning algorithm proposed here addresses all of the above difficulties. This paper demonstrates the experimental results of the comparison among the 2-norm pruning algorithm and two classical pruning algorithms, the Minimal Cost-Complexity algorithm (used in CART) and the Error-based pruning algorithm (used in C4.5), and confirms that the 2-norm pruning algorithm is superior in accuracy and speed.
منابع مشابه
The Biases of Decision Tree Pruning Strategies
Post pruning of decision trees has been a successful approach in many real-world experiments, but over all possible concepts it does not bring any inherent improvement to an algorithm's performance. This work explores how a PAC-proven decision tree learning algorithm fares in comparison with two variants of the normal top-down induction of decision trees. The algorithm does not prune its hypoth...
متن کاملEvaluation of liquefaction potential based on CPT results using C4.5 decision tree
The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...
متن کاملCC4.5: cost-sensitive decision tree pruning
There are many methods to prune decision trees, but the idea of cost-sensitive pruning has received much less investigation even though additional flexibility and increased performance can be obtained from this method. In this paper, we introduce a cost-sensitive decision tree pruning algorithm called CC4.5 based on the C4.5 algorithm. This algorithm uses the same method as C4.5 to construct th...
متن کاملStudy of Various Decision Tree Pruning Methods with their Empirical Comparison in WEKA
Classification is important problem in data mining. Given a data set, classifier generates meaningful description for each class. Decision trees are most effective and widely used classification methods. There are several algorithms for induction of decision trees. These trees are first induced and then prune subtrees with subsequent pruning phase to improve accuracy and prevent overfitting. In...
متن کاملAdjusting for Multiple Comparisons in Decision Tree Pruning
Introduction Bonferroni Pruning Pruning is a common technique to avoid over tting in decision trees. Most pruning techniques do not account for one important factor | multiple comparisons. Multiple comparisons occur when an induction algorithm examines several candidate models and selects the one that best accords with the data. Making multiple comparisons produces incorrect inferences about mo...
متن کامل